Abstract

We show that the sets in a family with finite VC dimension can be uniformly
approximated within a given error by a finite partition. Immediate corollaries include the
fact that VC classes have finite bracketing numbers, satisfy uniform laws of averages 
under strong dependence, and exhibit uniform mixing. Our results are based on recent 
work concerning uniform laws of averages for VC classes under ergodic sampling. 
1 Introduction

Let $\mathcal{X}$ be a complete separable metric space with Borel sigma field $\mathcal{S}$, and let $\mathcal{C} \subseteq \mathcal{S}$ be a
family of measurable sets. For each finite set $D \subseteq \mathcal{X}$, let $\{C \cap D : C \in \mathcal{C}\}$ be the collection of
subsets of $D$ induced by the members of $\mathcal{C}$. The family $\mathcal{C}$ is said to be a Vapnik-Chervonenkis
(VC) class if there is a finite integer $k$ such that

$$|\{C \cap D : C \in \mathcal{C}\}| < 2^k \quad \text{for every } D \subseteq \mathcal{X} \text{ with } |D| = k. \tag{1}$$

Here and in what follows $|\cdot|$ denotes cardinality. The smallest $k$ for which (1) holds is
known as the VC-dimension of $\mathcal{C}$. Classes of sets having finite VC-dimension play a central
role in the theory of machine learning and empirical processes (c.f. [5]).
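As a minimal sketch of condition (1), the following brute-force check finds the smallest $k$ for which (1) holds when the ground set is finite. The finite ground set, the toy class of discrete intervals, and the helper names `trace` and `smallest_k` are assumptions made for illustration; they do not appear in the paper.

```python
from itertools import combinations

def trace(family, D):
    """Distinct subsets of D induced by the sets in `family`."""
    D = frozenset(D)
    return {frozenset(C) & D for C in family}

def smallest_k(family, ground):
    """Smallest k such that every k-point subset D of `ground` has fewer
    than 2**k induced subsets; None if no such k <= |ground| exists."""
    ground = list(ground)
    for k in range(1, len(ground) + 1):
        if all(len(trace(family, D)) < 2 ** k for D in combinations(ground, k)):
            return k
    return None

# Hypothetical toy class: discrete intervals {a, ..., b-1} inside {0, ..., 5}.
ground = range(6)
intervals = [set(range(a, b)) for a in range(7) for b in range(a, 7)]
k_intervals = smallest_k(intervals, ground)

# A second toy class made of pairwise disjoint singletons.
singletons = [{0}, {1}, {2}]
k_singletons = smallest_k(singletons, range(3))
```

Discrete intervals can pick out every subset of a two-point set but never the two outer points of a three-point set without the middle one, so the search stops at $k = 3$ for that class.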

1.1 Principal Result 

Let $\mu$ be a probability measure on $(\mathcal{X},\mathcal{S})$, and let $\pi$ be a finite, measurable partition of $\mathcal{X}$.
For every set $C \in \mathcal{C}$, the $\pi$-boundary of $C$, denoted $\partial(C:\pi)$, is the union of all the cells in
$\pi$ that intersect both $C$ and its complement with positive probability. Formally,

$$\partial(C:\pi) = \bigcup \{A \in \pi : \mu(A \cap C) > 0 \text{ and } \mu(A \cap C^c) > 0\}.$$

Note that $\partial(C:\pi)$ depends on $\mu$; this dependence is suppressed in our notation. Of interest
here is the existence of a fixed finite partition $\pi$ such that the measure of the boundary
$\partial(C:\pi)$ is small for every set $C$ in $\mathcal{C}$. In general, the existence of a uniformly approximating
partition depends on the family $\mathcal{C}$ and the measure $\mu$. Our main result shows that VC classes
possess this uniform approximation property, regardless of the measure $\mu$.
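To make the definition of the boundary concrete, here is a small sketch that evaluates the measure of the $\pi$-boundary of a set when the probability measure is supported on finitely many points. The finite setting, the dictionary representation of the measure, and the function name `boundary_measure` are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction

def boundary_measure(C, partition, mu):
    """Total mass of the cells A with mu(A & C) > 0 and mu(A - C) > 0."""
    def mass(S):
        return sum((mu[x] for x in S), Fraction(0))
    C = set(C)
    total = Fraction(0)
    for A in partition:
        A = set(A)
        if mass(A & C) > 0 and mass(A - C) > 0:
            total += mass(A)
    return total

# Hypothetical example: eight points of mass 1/8 and four two-point cells.
mu = {x: Fraction(1, 8) for x in range(8)}
partition = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]

# The set {0, 1, 2} splits only the cell {2, 3}.
b_split = boundary_measure({0, 1, 2}, partition, mu)
# A union of cells has empty boundary.
b_union = boundary_measure({0, 1, 4, 5}, partition, mu)
```

In this toy case the boundary of a set is empty exactly when the set is a union of cells, which is why refining the partition can only shrink the boundary.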

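Theorem 1. Let $\mu$ be a probability measure on $(\mathcal{X},\mathcal{S})$. If $\mathcal{C}$ is a VC-class, then for every
$\varepsilon > 0$ there exists a finite measurable partition $\pi$ of $\mathcal{X}$ such that

$$\sup_{C \in \mathcal{C}} \mu(\partial(C:\pi)) < \varepsilon. \tag{2}$$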

Several corollaries of Theorem 1 are discussed in the next section. The proof of Theorem
1 is presented in Section 3.

2 Corollaries of Theorem 1

Here we present several immediate corollaries of Theorem 1 that may be of independent
interest. 



2.1 Bracketing of VC Classes 

Let $\mu$ be a probability measure on $(\mathcal{X},\mathcal{S})$. For each pair of sets $A, B \in \mathcal{S}$, the bracket $[A,B]$
consists of all those sets $C \subseteq \mathcal{X}$ such that $A \subseteq C \subseteq B$. If $A$ is not a subset of $B$, then $[A,B]$
is empty. The bracket $[A,B]$ is said to be an $\varepsilon$-bracket if $\mu(B \setminus A) < \varepsilon$. The bracketing
number $N_{[\,]}(\varepsilon, \mathcal{C}, \mu)$ of a family $\mathcal{C} \subseteq \mathcal{S}$ is the least number of $\varepsilon$-brackets needed to cover $\mathcal{C}$.
Note that the sets defining the minimal brackets need not be elements of $\mathcal{C}$.

Corollary 1. Let $\mu$ be any probability measure on $(\mathcal{X},\mathcal{S})$. If $\mathcal{C}$ is a countable VC-class,
then $N_{[\,]}(\varepsilon, \mathcal{C}, \mu)$ is finite for every $\varepsilon > 0$.

Remark: Using routine arguments, the assumption that $\mathcal{C}$ is countable can be replaced
by the weaker assumption that there exists a countable sub-family $\mathcal{C}_0 \subseteq \mathcal{C}$ such that the
indicator function of every set in $\mathcal{C}$ is the pointwise limit of the indicator functions of sets
in $\mathcal{C}_0$.

Proof: Fix a probability measure $\mu$ and $\varepsilon > 0$. Let $\pi = \{A_1, \ldots, A_m\}$ be a finite measurable
partition of $\mathcal{X}$ such that (2) holds, and assume without loss of generality that each set $A_j$
has positive $\mu$-measure. Let $A_j$ be an element of $\pi$. For each $C \in \mathcal{C}$, remove points in $C$
from $A_j$ if $\mu(A_j \cap C) = 0$, and remove points in $C^c$ from $A_j$ if $\mu(A_j \cap C^c) = 0$. Denote
the resulting set by $B_j$. Clearly $B_j \subseteq A_j$ and, as $\mathcal{C}$ is countable, $\mu(A_j \setminus B_j) = 0$. The
definition of $B_j$ ensures that for each $C \in \mathcal{C}$ exactly one of the following relations holds:
$B_j \subseteq C$, $B_j \subseteq C^c$, or $\mu(B_j \cap C) \cdot \mu(B_j \cap C^c) > 0$. Let $B_0 = \mathcal{X} \setminus \bigcup_{j=1}^m B_j$, and define
the partition $\pi' = \{B_0, B_1, \ldots, B_m\}$. Given $C \in \mathcal{C}$ let $C_l = \bigcup\{B \in \pi' : B \subseteq C\}$ and
$C_u = \bigcup\{B \in \pi' : B \cap C \neq \emptyset\}$. A straightforward argument shows that $C_l \subseteq C \subseteq C_u$, and
that $\mu(C_u \setminus C_l) = \mu(\partial(C:\pi')) = \mu(\partial(C:\pi)) < \varepsilon$. It follows that $\mathfrak{B} = \{[C_l, C_u] : C \in \mathcal{C}\}$ is
a collection of $\varepsilon$-brackets covering $\mathcal{C}$. The cardinality of $\mathfrak{B}$ is at most $2^{2|\pi'|}$.
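The bracket construction in this proof can be sketched on a finite ground set as follows. This is a toy illustration under stated assumptions: every point carries positive mass (so the null-set trimming step is vacuous), and the names `bracket`, `family`, and `widths` are hypothetical.

```python
from fractions import Fraction

def bracket(C, partition):
    """Lower and upper approximations of C by unions of partition cells."""
    C = frozenset(C)
    lower = frozenset().union(*[A for A in partition if A <= C])
    upper = frozenset().union(*[A for A in partition if A & C])
    return lower, upper

# Hypothetical toy setting: 8 points of mass 1/8, four cells of two points.
mu = {x: Fraction(1, 8) for x in range(8)}
partition = [frozenset({0, 1}), frozenset({2, 3}),
             frozenset({4, 5}), frozenset({6, 7})]
# Toy class: discrete intervals {a, ..., b-1} inside {0, ..., 7}.
family = [frozenset(range(a, b)) for a in range(9) for b in range(a, 9)]

# Distinct brackets, and the measure of upper minus lower for each of them.
brackets = {bracket(C, partition) for C in family}
widths = [sum(mu[x] for x in upper - lower) for lower, upper in brackets]
```

Each bracket is a pair of unions of cells, so the number of distinct brackets is bounded by the square of the number of such unions, whatever the size of the class.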

2.2 Uniform Laws of Large Numbers 

Let $X_1, X_2, \ldots$ be a stationary ergodic process taking values in $(\mathcal{X},\mathcal{S})$ with $X_i \sim \mu$. The
ergodic theorem ensures that, for every $C \in \mathcal{S}$, the sample averages $n^{-1} \sum_{i=1}^n I_C(X_i)$ converge
with probability one to $\mu(C)$. For VC classes and i.i.d. sequences $\{X_i\}$ this convergence is
known to be uniform over $\mathcal{C}$ [10]. Using Corollary 1 it is easy to show that this uniform
convergence extends to ergodic processes as well.

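Theorem 2. If $\mathcal{C}$ is a countable VC-class of sets and $X_1, X_2, \ldots \in \mathcal{X}$ is a stationary ergodic
process with $X_i \sim \mu$, then

$$\sup_{C \in \mathcal{C}} \left| \frac{1}{n} \sum_{i=1}^n I_C(X_i) - \mu(C) \right| \to 0$$

with probability one as $n$ tends to infinity.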



Proof: This follows easily from Corollary 1 and the Blum-DeHardt law of large numbers
(c.f. [9]), which establishes that families with finite bracketing numbers have the
Glivenko-Cantelli property.

The uniform strong law in Theorem 2 was established in [1] using arguments similar to
those for Theorem 1. Analogous uniform strong laws for VC major and VC graph classes are
given in [1], while [2] contains uniform strong laws for classes of functions having finite gap
(fat shattering) dimension. See these papers for a discussion of earlier and related work.

2.3 Uniform Mixing Conditions in Ergodic Theory 

Let $T$ be an ergodic $\mu$-measure preserving transformation of $(\mathcal{X},\mathcal{S})$. $T$ is said to be strongly
mixing if for each pair $A, B$ of measurable sets, $\lim_{n \to \infty} \mu(A \cap T^{-n}B) = \mu(A)\mu(B)$. Theorem
1 can be applied to show that strong mixing occurs uniformly over a countable VC class.

Proposition 1. If $\mathcal{C} \subseteq \mathcal{S}$ is a countable VC-class of measurable sets, and $T$ is a strongly
mixing transformation, then

$$\lim_{n \to \infty} \sup_{A, B \in \mathcal{C}} |\mu(A \cap T^{-n}B) - \mu(A)\mu(B)| = 0.$$

Proof: Given $\varepsilon > 0$, let $\pi$ be a finite partition such that $\sup_{C \in \mathcal{C}} \mu(\partial(C:\pi)) < \varepsilon$. Choose a
natural number $N$ such that for $n \ge N$ and each pair $D_1, D_2 \in \pi$,

$$|\mu(D_1 \cap T^{-n}D_2) - \mu(D_1)\mu(D_2)| \le \varepsilon\, \mu(D_1)\mu(D_2).$$

For every measurable set $A$ let $\overline{A} = \bigcup\{D \in \pi : \mu(D \cap A) > 0\}$ and $\underline{A} = \bigcup\{D \in \pi : D \subseteq A\}$
be, respectively, upper and lower approximations of $A$ derived from the cells of $\pi$. Note
that if $A, B$ are measurable sets satisfying $A = \underline{A}$ and $B = \underline{B}$, then



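$$\begin{aligned}
|\mu(A \cap T^{-n}B) - \mu(A)\mu(B)|
&= \Big| \sum_{D \subseteq A} \sum_{D' \subseteq B} \mu(D \cap T^{-n}D') - \sum_{D \subseteq A} \sum_{D' \subseteq B} \mu(D)\mu(D') \Big| \\
&\le \sum_{D \subseteq A} \sum_{D' \subseteq B} |\mu(D \cap T^{-n}D') - \mu(D)\mu(D')| \\
&\le \sum_{D \subseteq A} \sum_{D' \subseteq B} \varepsilon\, \mu(D)\mu(D') \le \varepsilon\, \mu(A)\mu(B) \le \varepsilon.
\end{aligned}$$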



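Suppose now that $A, B$ are sets in $\mathcal{C}$. Then for $n \ge N$,

$$\begin{aligned}
|\mu(A \cap T^{-n}B) - \mu(A)\mu(B)|
&= |\mu(A \cap T^{-n}B) \pm \mu(A \cap T^{-n}\underline{B}) \pm \mu(\underline{A} \cap T^{-n}\underline{B}) \pm \mu(\underline{A})\mu(\underline{B}) \pm \mu(A)\mu(\underline{B}) - \mu(A)\mu(B)| \\
&\le 2\mu(B \setminus \underline{B}) + 2\mu(A \setminus \underline{A}) + |\mu(\underline{A} \cap T^{-n}\underline{B}) - \mu(\underline{A})\mu(\underline{B})| \\
&\le 5\varepsilon,
\end{aligned}$$

where the first inequality follows from the triangle inequality, and the second follows from
the previous two displays. As $A, B \in \mathcal{C}$ and $\varepsilon > 0$ were arbitrary, Proposition 1 follows.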

A similar argument can be used to show that any weak mixing transformation satisfies 
uniform convergence over countable VC classes. A measure preserving transformation T is 
weak mixing if given measurable sets A and B, 

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} |\mu(A \cap T^{-i}B) - \mu(A)\mu(B)| = 0.$$

Proposition 2. If $\mathcal{C}$ is a countable VC-class of measurable sets and $T$ is a weakly mixing
transformation, then

$$\lim_{n \to \infty} \sup_{A, B \in \mathcal{C}} \frac{1}{n} \sum_{i=0}^{n-1} |\mu(A \cap T^{-i}B) - \mu(A)\mu(B)| = 0.$$

3 Proof of Theorem 1

The proof of Theorem 1 follows arguments used in [1] to establish uniform laws of large
numbers for VC classes under ergodic sampling, and we make use of several auxiliary results 
from that paper in what follows. 

3.1 Joins and the VC dimension 

Definition: The join of $k$ sets $A_1, \ldots, A_k \subseteq [0,1]$, denoted $J = \bigvee_{i=1}^k A_i$, is the partition
consisting of all non-empty intersections $\tilde{A}_1 \cap \cdots \cap \tilde{A}_k$, where $\tilde{A}_i \in \{A_i, A_i^c\}$ for $i = 1, \ldots, k$.

Note that $J$ is a finite partition of $[0,1]$. The join of $A_1, \ldots, A_k$ is said to be full if it
has (maximal) cardinality $2^k$. The next lemma (see [1]) makes an elementary connection
between full joins and the VC dimension.

Lemma 1. Let $\mathcal{C}$ be any collection of subsets of $\mathcal{X}$. If for some $k \ge 1$ there exists a
collection $\mathcal{C}_0 \subseteq \mathcal{C}$ of $2^k$ sets having a full join, then VC-dim($\mathcal{C}$) $\ge k$.
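The definition of a join, and the test for fullness, can be sketched for subsets of a finite ground set. The finite ground set (in place of $[0,1]$) and the function names `join` and `is_full` are assumptions made for illustration only.

```python
from itertools import product

def join(sets, ground):
    """Non-empty cells obtained by intersecting each set or its complement."""
    ground = frozenset(ground)
    cells = set()
    for signs in product((True, False), repeat=len(sets)):
        cell = ground
        for A, keep in zip(sets, signs):
            A = frozenset(A)
            # keep=True intersects with A, keep=False with its complement
            cell = cell & A if keep else cell - A
        if cell:
            cells.add(cell)
    return cells

def is_full(sets, ground):
    """True when the join has the maximal cardinality 2**k for k sets."""
    return len(join(sets, ground)) == 2 ** len(sets)

# Hypothetical examples on the ground set {0, 1, 2, 3}.
ground = range(4)
full_pair = [{0, 1}, {0, 2}]        # all four sign patterns are non-empty
disjoint_pair = [{0, 1}, {2, 3}]    # only two sign patterns are non-empty
```

Cells coming from different sign patterns are disjoint, so counting the non-empty cells is enough to decide whether the join is full.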



The proof given here establishes that the approximating partition $\pi$ is measurable $\sigma(\mathcal{C})$.
A simple counterexample shows that it is not sufficient for the elements of $\pi$ to belong
to $\bigcup_{n=1}^{\infty} \sigma(C_1, C_2, \ldots, C_n)$. To see this, let $\mathcal{X} = [0,1]$ and let $\lambda$ be Lebesgue measure.
Let $a_1, a_2, \ldots > 0$ be a sequence of numbers such that $s = \sum_{n=1}^{\infty} a_n < 1$. Let $s_n =
\sum_{i=1}^{n} a_i$ for $n \ge 1$ and let $s_0 = 0$. Define $C_n = [s_{n-1}, s_n)$ for $n \ge 1$. Clearly, the
VC-dimension of the class $\{C_1, C_2, \ldots\}$ equals 1, since its constituent sets are disjoint. Define
$J_n = C_1 \vee C_2 \vee \cdots \vee C_n$. Then $A_n = [s_n, 1]$ is a single element in $J_n$ with measure
$1 - s_n > 1 - s > 0$. Moreover, both $A_n \cap C_{n+1}$ and $A_n \cap C_{n+1}^c$ have positive measure, so
that, with $\mu = \lambda$, $\mu(\partial(C_{n+1} : J_n)) > 1 - s$ for $n \ge 1$.
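The counterexample can be checked numerically with exact rational arithmetic. The sketch below assumes the specific choice $a_n = 2^{-(n+1)}$ (so that $s = 1/2$), which is one admissible sequence and not one prescribed by the text; intervals are stored as endpoint pairs and the helper names are hypothetical.

```python
from fractions import Fraction

def overlap(I, J):
    """Lebesgue measure of the intersection of two intervals (lo, hi)."""
    return max(Fraction(0), min(I[1], J[1]) - max(I[0], J[0]))

def boundary_measure_intervals(C, cells):
    """Total length of the cells that meet both C and its complement
    in positive Lebesgue measure; C and the cells are intervals."""
    total = Fraction(0)
    for A in cells:
        length = A[1] - A[0]
        inside = overlap(A, C)
        if inside > 0 and length - inside > 0:
            total += length
    return total

def counterexample(n):
    """Boundary measure of C_{n+1} relative to J_n, together with 1 - s_n."""
    a = [Fraction(1, 2 ** (i + 1)) for i in range(1, n + 2)]  # a_1, ..., a_{n+1}
    s = [Fraction(0)]
    for a_i in a:
        s.append(s[-1] + a_i)                                 # s_0, ..., s_{n+1}
    # Cells of J_n: C_1, ..., C_n and the remainder A_n = [s_n, 1].
    cells = [(s[i - 1], s[i]) for i in range(1, n + 1)] + [(s[n], Fraction(1))]
    C_next = (s[n], s[n + 1])
    return boundary_measure_intervals(C_next, cells), 1 - s[n]

b3, tail3 = counterexample(3)
```

Since endpoints have Lebesgue measure zero, treating half-open and closed intervals alike does not affect the computed measures. For every $n$ the only cell split by $C_{n+1}$ is $A_n$, so the boundary measure equals $1 - s_n$ and stays above $1 - s$.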

3.2 Reduction to the Unit Interval 

Fix a probability measure $\mu$ on $(\mathcal{X},\mathcal{S})$ and let $\mathcal{C} \subseteq \mathcal{S}$ have finite VC dimension. It follows
from standard results on the $L_p$-covering numbers of VC classes (c.f. Theorem 2.6.4 of [9])
that there exists a countable sub-family $\mathcal{C}_0$ of $\mathcal{C}$ such that

$$\inf_{C' \in \mathcal{C}_0} \mu(C' \triangle C) = 0$$

for each $C \in \mathcal{C}$. An elementary argument then shows that, for every finite partition $\pi$,

$$\sup_{C \in \mathcal{C}_0} \mu(\partial(C:\pi)) = \sup_{C \in \mathcal{C}} \mu(\partial(C:\pi)),$$

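and we may therefore assume that $\mathcal{C}$ is countable. Let $\mathcal{X}_0 = \{x : \mu(\{x\}) > 0\}$ be the set of
atoms of $\mu$ and let $\mu_0(A) = \mu(A \cap \mathcal{X}_0)$ be the atomic component of $\mu$. As $\mathcal{X}_0$ is countable,
it is easy to see that

$$\inf_{\pi \in \Pi} \sup_{C \in \mathcal{C}} \mu_0(\partial(C:\pi)) = 0,$$

where $\Pi$ denotes the collection of all finite measurable partitions,
and we may therefore assume that $\mu$ is non-atomic.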

Following the proof in [1], we make two further reductions. Let $\lambda(\cdot)$ be Lebesgue
measure on the unit interval $[0,1]$ equipped with its Borel subsets $\mathcal{B}$. Using the existence of a
measure-preserving isomorphism between $(\mathcal{X},\mathcal{S},\mu)$ and $([0,1],\mathcal{B},\lambda)$ (c.f. [8]) a straightforward
argument ensures that we lose no generality in assuming that $\mathcal{X} = [0,1]$, $\mu = \lambda$, and
that $\mathcal{C} \subseteq \mathcal{B}$ is a countable family with finite VC dimension. Using an additional
isomorphism described in Lemma 6 of [1] we may further assume that each element of $\mathcal{C}$ is a finite
union of intervals.

Based on the reductions above, Theorem 1 is a corollary of the following result.



Theorem 3. Let $\mathcal{C} \subseteq \mathcal{B}$ be a countable VC class, each of whose elements is a finite union
of intervals. For every $\varepsilon > 0$ there exists a finite partition $\pi$ of $[0,1]$ such that

$$\sup_{C \in \mathcal{C}} \lambda(\partial(C:\pi)) < \varepsilon.$$

Remark: The proof of Theorem 3 follows the proof of Proposition 3 from [1]. Beginning
with the assumption that the conclusion of the theorem is false, we construct, in a step-wise
fashion, a sequence of "splitting sets" $R_1, R_2, \ldots \subseteq [0,1]$ from the sets in $\mathcal{C}$. At the $k$th
stage the splitting set $R_k$ is obtained from a sequential procedure that makes use of the
splitting sets $R_1, \ldots, R_{k-1}$ produced at previous stages. The splitting sets are then used to
identify finite, but arbitrarily large, collections of sets in $\mathcal{C}$ having full join. The existence
of these collections implies that $\mathcal{C}$ has infinite VC dimension by Lemma 1.

Proof of Theorem 3. Suppose to the contrary that there exists an $\eta > 0$ such that

$$\sup_{C \in \mathcal{C}} \lambda(\partial(C:\pi)) > \eta \quad \text{for every } \pi \in \Pi. \tag{3}$$

For $n \ge 1$ let $\mathcal{D}_n = \{[k2^{-n}, (k+1)2^{-n}] : 0 \le k \le 2^n - 1\}$ be the set of closed dyadic
intervals of order $n$.

Stage 1. Let $C_1(1)$ be any set in $\mathcal{C}$. Suppose that sets $C_1(1), \ldots, C_1(n) \in \mathcal{C}$ have already
been selected, and let $J_1(n) = \mathcal{D}_n \vee C_1(1) \vee \cdots \vee C_1(n)$. It follows from (3) that there is a
set $C_1(n+1) \in \mathcal{C}$ such that $G_1(n) = \partial(C_1(n+1) : J_1(n))$ has measure greater than $\eta$. Let
$J_1(n+1) = \mathcal{D}_{n+1} \vee C_1(1) \vee \cdots \vee C_1(n+1)$ and continue in the same fashion. The sets $\{G_1(n)\}$ are
naturally associated with a tight family of sub-probability measures $\{\lambda_n(\cdot) = \lambda(\cdot \cap G_1(n))\}$.
There is therefore a subsequence $\{\lambda_{n_1(r)}\}$ that converges weakly to a sub-probability $\nu_1$ on
$([0,1], \mathcal{B})$. It is easy to see that $\nu_1$ is absolutely continuous with respect to $\lambda$ and that

$$\nu_1([0,1]) \ge \limsup_{r \to \infty} \lambda_{n_1(r)}([0,1]) \ge \eta.$$

The Radon-Nikodym derivative $d\nu_1/d\lambda$ is well defined, and is bounded above by 1. Define
the splitting set $R_1 = \{x : (d\nu_1/d\lambda)(x) > \eta/2\}$. From the previous remarks it follows that

$$\eta \le \nu_1([0,1]) = \int_0^1 \frac{d\nu_1}{d\lambda}\, d\lambda \le \int_{R_1} 1\, d\lambda + \int_{R_1^c} \frac{\eta}{2}\, d\lambda \le \lambda(R_1) + \frac{\eta}{2}, \tag{4}$$

and therefore $\lambda(R_1) \ge \eta/2$.

Subsequent stages. In order to construct the splitting set $R_k$ at stage $k$, let $C_k(1)$ be
any element of $\mathcal{C}$, and suppose that $C_k(2), \ldots, C_k(n)$ have already been selected. Define the
join

$$J_k(n) = \mathcal{D}_n \vee \bigvee_{j=1}^{k-1} R_j \vee \bigvee_{i=1}^{n} C_k(i). \tag{5}$$

By (3) there exists a set $C_k(n+1) \in \mathcal{C}$ such that $G_k(n) = \partial(C_k(n+1) : J_k(n))$ has measure
greater than $\eta$. This process continues as in stage 1. As before, there is a sequence of
integers $n_k(1) < n_k(2) < \cdots$ such that the measures $B \mapsto \lambda(B \cap G_k(n_k(r)))$ converge weakly to a
sub-probability measure $\nu_k$ on $([0,1], \mathcal{B})$ that is absolutely continuous with respect to $\lambda(\cdot)$.
Define $R_k = \{x : (d\nu_k/d\lambda)(x) > \eta/2\}$.

Construction of Full Joins. Fix an integer $L \ge 2$. As the measures of the sets $R_k$
are bounded away from zero, there exist positive integers $k_1 < k_2 < \cdots < k_L$ such that
$\lambda(\bigcap_{j=1}^{L} R_{k_j}) > 0$. Suppose without loss of generality that $k_j = j$, and define the intersections

$$Q_r = \bigcap_{j=1}^{L-r} R_j$$

for $r = 0, 1, \ldots, L-1$. Note that $Q_0 \subseteq Q_1 \subseteq \cdots \subseteq Q_{L-1}$. We show that there exist sets
$D_1, D_2, \ldots, D_{L-1} \in \mathcal{C}$ such that, for $l = 1, \ldots, L-1$,

(i) the join $K_l = D_1 \vee D_2 \vee \cdots \vee D_l$ has cardinality $|K_l| = 2^l$, and

(ii) $B^\circ \cap Q_l$ is non-empty for each $B \in K_l$, where $B^\circ$ denotes the interior of $B$.

We proceed by induction, beginning with the case $l = 1$. Let $x_1$ be a Lebesgue point of $Q_0$,
and let $\varepsilon = \eta/2(\eta+2)$. Then there exists $\alpha_1 > 0$ such that the interval $I_1 = (x_1 - \alpha_1, x_1 + \alpha_1)$
satisfies

$$\lambda(I_1 \cap Q_0) \ge (1-\varepsilon)\lambda(I_1) = 2\alpha_1(1-\varepsilon). \tag{6}$$

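It follows from the last display and the definition of $R_L \supseteq Q_0$ that

$$\nu_L(I_1 \cap R_L) = \int_{I_1 \cap R_L} \frac{d\nu_L}{d\lambda}\, d\lambda \ge \frac{\eta}{2}\, \lambda(I_1 \cap R_L) \ge \alpha_1(1-\varepsilon)\eta. \tag{7}$$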

Let $\{n_L(r) : r \ge 1\}$ be the subsequence used to define the sub-probability $\nu_L$. As $I_1$ is an
open set, the portmanteau theorem and (7) imply that

$$\liminf_{r \to \infty} \lambda(I_1 \cap G_L(n_L(r))) \ge \nu_L(I_1) \ge \nu_L(I_1 \cap R_L) \ge \alpha_1(1-\varepsilon)\eta.$$

Choose $r$ sufficiently large so that $\lambda(I_1 \cap G_L(n_L(r))) \ge \alpha_1(1-\varepsilon)\eta$ and $2^{-n_L(r)} < \eta\alpha_1/8$.
We require the following lemma from [1].



Lemma 2. There exists a cell $A$ of $J_L(n_L(r))$ such that $A \subseteq \partial(C_L(n_L(r)+1) : J_L(n_L(r)))$,
$A \subseteq I_1$ and $\lambda(A \cap Q_1) > 0$. Moreover, $A$ is contained in $Q_1$.

Let $D_1 = C_L(n_L(r)+1) \in \mathcal{C}$, and let $A$ be the set identified in Lemma 2. By definition
of the boundary, $\lambda(A \cap D_1) > 0$ and $\lambda(A \cap D_1^c) > 0$, and therefore $\lambda(Q_1 \cap D_1) > 0$ and
$\lambda(Q_1 \cap D_1^c) > 0$ as well. As the Lebesgue measure of the boundary $D_1 \setminus D_1^\circ$ of $D_1$ is zero,
assertion (ii) above follows.

Suppose now that we have identified sets $D_1, \ldots, D_l \in \mathcal{C}$, with $l \le L-2$, such that (i)
and (ii) hold. Let the join $K_l = \{B_j : 1 \le j \le 2^l\}$, and for each $j$ let $x_j \in B_j^\circ \cap Q_l$. Select
$\alpha_{l+1} > 0$ such that for each $j$ the interval $I_j = (x_j - \alpha_{l+1}, x_j + \alpha_{l+1})$ is contained in $B_j^\circ$ and
satisfies

$$\lambda(I_j \cap Q_l) \ge (1-\varepsilon)\lambda(I_j) = 2\alpha_{l+1}(1-\varepsilon).$$

To simplify notation, let $\kappa = L - l$. Let $\{n_\kappa(r) : r \ge 1\}$ be the subsequence used to define
the sub-probability $\nu_\kappa$. For each interval $I_j$,

$$\liminf_{r \to \infty} \lambda(I_j \cap G_\kappa(n_\kappa(r))) \ge \nu_\kappa(I_j) \ge \nu_\kappa(I_j \cap R_\kappa) \ge \alpha_{l+1}(1-\varepsilon)\eta,$$

where the last inequality follows from the previous display, and the fact that $Q_l \subseteq R_\kappa$.
Choose $r$ sufficiently large so that $\lambda(I_j \cap G_\kappa(n_\kappa(r))) \ge \alpha_{l+1}(1-\varepsilon)\eta$ for each $j$, and
$2^{-n_\kappa(r)} < \eta\alpha_{l+1}/8$.

By applying Lemma 2 to each interval $I_j$, one may establish the existence of sets
$A_j \subseteq \partial(C_\kappa(n_\kappa(r)+1) : J_\kappa(n_\kappa(r)))$ such that $A_j \subseteq I_j \subseteq B_j^\circ$, $\lambda(A_j \cap Q_{l+1}) > 0$, and $A_j \subseteq Q_{l+1}$.
Let $D_{l+1} = C_\kappa(n_\kappa(r)+1) \in \mathcal{C}$. Arguments like those for the case $l = 1$ above show that
for each $j$ the intersections $A_j \cap D_{l+1}^\circ$ and $A_j \cap (D_{l+1}^c)^\circ$ are non-empty, and the inductive
step is complete. Given any two dyadic intervals, they are disjoint, intersect at one point,
or one contains the other. Therefore, among the sets $D_1, \ldots, D_{L-1}$, at most one can be a
dyadic interval; the remainder are contained in $\mathcal{C}$.